Statistics 651
Normal Sampling Model

Winter 2026

Objectives

  1. Perform analysis with the normal sampling model under various priors

Resources

  • BDA3: Ch 3.1-3.3
  • Hoff: Ch 5 (normal)

Joint vs marginal posterior

Joint posterior

\[ \pi(\boldsymbol{\theta} \mid \boldsymbol{x}) = \frac{f(\boldsymbol{x} \mid \boldsymbol{\theta}) \, \pi(\boldsymbol{\theta})}{\int_\boldsymbol{\Theta} f(\boldsymbol{x} \mid \boldsymbol{\theta}) \, \Pi(\mbox{d} \boldsymbol{\theta}) } \]

Marginal posterior

Splitting up the parameter \(\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)\), we can write \[\begin{align} \pi(\boldsymbol{\theta}_1 \mid \boldsymbol{x}) &= \int_{\boldsymbol{\Theta}_2} \Pi( \boldsymbol{\theta}_1, \, \mbox{d} \boldsymbol{\theta}_2 \mid \boldsymbol{x}) \\ &= \int_{\boldsymbol{\Theta}_2} \pi( \boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2, \boldsymbol{x}) \, \Pi_2(\mbox{d} \boldsymbol{\theta}_2 \mid \boldsymbol{x}) \, . \end{align}\]

  • Remove nuisance parameters
  • Posterior summaries of \(\boldsymbol{\theta}_1\) are simpler, and \(\mathbb{E}(\boldsymbol{\theta}_1 \mid \boldsymbol{x}) = \mathbb{E}_{\Pi_2}(\mathbb{E}(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2, \boldsymbol{x}))\)

Normal sampling model with \(\mu\) and \(\sigma^2\) unknown

Normal model

Prior choices:

  • Noninformative
    • Use case: reference analysis
  • Fully conjugate
    • Use case: historical info available, computational convenience
  • Conditionally conjugate
    • Use case: info available, more flexibility needed
  • Alternate variance priors
    • Use case: more realistic, better properties

Normal model: likelihood

Assuming \(x_1, \ldots, x_n \overset{\text{iid}}{\sim} \mbox{Normal}(\mu, \sigma^2)\),


\(f(\boldsymbol{x} \mid \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right)\), where \(\sum_{i=1}^n (x_i - \mu)^2 = (n-1)s^2 + n(\bar{x} - \mu)^2\).

Normal model: “noninformative” prior

Normal model: “noninformative” prior

Noninformative prior

\(\pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2}\)

This is equivalent to using \(\pi(\mu, \log\sigma) \propto 1\), and is the independence-Jeffreys prior (the product of the separate Jeffreys priors for \(\mu\) and for \(\sigma^2\)).

Then we can write \(\pi(\mu, \sigma^2 \mid \boldsymbol{x}) = \pi(\sigma^2 \mid \boldsymbol{x}) \times \pi(\mu \mid \sigma^2, \boldsymbol{x})\)

(See normal-inverse-gamma distribution)

where

\(\pi(\sigma^2 \mid \boldsymbol{x}) \propto (\sigma^2)^{-(n+1)/2} \exp\left( -\frac{(n-1)s^2}{2\sigma^2} \right)\)

Normal model: “noninformative” prior

Prior: \(\pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2}\)

Posterior: \(\pi(\mu, \sigma^2 \mid \boldsymbol{x}) = \pi(\sigma^2 \mid \boldsymbol{x}) \times \pi(\mu \mid \sigma^2, \boldsymbol{x})\)

with

\(\sigma^2 \mid \boldsymbol{x} \sim \mbox{Inv-Gamma}(\frac{n-1}{2}, \frac{(n-1)s^2}{2})\)

and

\(\pi(\mu \mid \sigma^2, \boldsymbol{x}) \propto \exp\left( -\frac{n(\mu - \bar{x})^2}{2\sigma^2} \right)\)

Normal model: “noninformative” prior

Prior: \(\pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2}\)

Posterior: \(\pi(\mu, \sigma^2 \mid \boldsymbol{x}) = \pi(\sigma^2 \mid \boldsymbol{x}) \times \pi(\mu \mid \sigma^2, \boldsymbol{x})\)

with \[\begin{align} \sigma^2 \mid \boldsymbol{x} &\sim \mbox{Inv-Gamma}\left(\frac{n-1}{2}, \, \frac{(n-1)s^2}{2} \right) \, , \\ \mu \mid \sigma^2, \boldsymbol{x} &\sim \mbox{Normal}\left(\bar{x}, \, \frac{\sigma^2}{n} \right) \, . \end{align}\]

This suggests sequentially sampling \((\mu, \sigma^2)\) from the posterior with the compositional method.
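A minimal sketch of the compositional draw in R: first draw \(\sigma^2 \mid \boldsymbol{x}\), then \(\mu \mid \sigma^2, \boldsymbol{x}\). The summaries `n`, `xbar`, and `s2` below are hypothetical illustrations, not from any dataset.

```r
# Compositional sampling from the joint posterior under pi(mu, sigma^2) ~ 1/sigma^2
set.seed(1)
n    <- 12    # hypothetical sample size
xbar <- 2.0   # hypothetical sample mean
s2   <- 0.5   # hypothetical sample variance
B    <- 10000 # number of posterior draws

# Step 1: sigma^2 | x ~ Inv-Gamma((n-1)/2, (n-1) s^2 / 2), drawn as 1/Gamma
sig2 <- 1 / rgamma(B, shape = (n - 1) / 2, rate = (n - 1) * s2 / 2)

# Step 2: mu | sigma^2, x ~ Normal(xbar, sigma^2 / n)
mu <- rnorm(B, mean = xbar, sd = sqrt(sig2 / n))
```

Each pair `(mu[b], sig2[b])` is then a draw from the joint posterior.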

Sampling from inverse gamma

Proposition

Gamma, Inv-Gamma relation

If \(X \sim \mbox{Gamma}(a, \, \texttt{rate}=b)\),

then \(Y = \frac{1}{X} \sim \mbox{Inv-Gamma}(a, \, \texttt{scale} = b)\)

  • We use the scale parameter with the inverse-gamma.
  • Suggests the following functions:
dinvgamma <- function(x, shape, scale) {
  dgamma(1.0 / x, shape = shape, rate = scale) / x^2
}

rinvgamma <- function(n, shape, scale) {
  1.0 / rgamma(n, shape = shape, rate = scale)
}
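A quick sanity check on these helpers (redefined here so the snippet is self-contained): the density should integrate to 1, and for shape \(a > 1\) the mean of \(\mbox{Inv-Gamma}(a, \texttt{scale}=b)\) is \(b/(a-1)\).

```r
# Inverse-gamma helpers, as defined above
dinvgamma <- function(x, shape, scale) {
  dgamma(1.0 / x, shape = shape, rate = scale) / x^2
}
rinvgamma <- function(n, shape, scale) {
  1.0 / rgamma(n, shape = shape, rate = scale)
}

set.seed(1)
# Density should integrate to (approximately) 1
total <- integrate(dinvgamma, 0, Inf, shape = 3, scale = 2)$value
# Monte Carlo mean should be near scale / (shape - 1) = 2 / 2 = 1
m <- mean(rinvgamma(1e5, shape = 3, scale = 2))
```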

Nuisance parameters

Prior: \(\pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2}\)

Posterior: \(\pi(\mu, \sigma^2 \mid \boldsymbol{x}) = \pi(\sigma^2 \mid \boldsymbol{x}) \times \pi(\mu \mid \sigma^2, \boldsymbol{x})\).

If interest lies only in \(\mu\), we can find

\(\pi(\mu \mid \boldsymbol{x}) = \int_0^\infty \pi(\sigma^2 \mid \boldsymbol{x}) \times \pi(\mu \mid \sigma^2, \boldsymbol{x}) \, \mbox{d}\sigma^2 \, ,\)


which is \(\frac{s}{\sqrt{n}} t_{n-1} + \bar{x}\),

where \(t_{\text{df}}\) denotes a Student-\(t\) distribution with \(\text{df}\) degrees of freedom.
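Draws from \(\pi(\mu \mid \boldsymbol{x})\) can therefore be made directly from this shifted and scaled Student-\(t\) representation; the summaries `n`, `xbar`, `s2` below are hypothetical.

```r
# Direct draws from the marginal posterior of mu via the Student-t representation
set.seed(1)
n <- 12; xbar <- 2.0; s2 <- 0.5; B <- 10000  # hypothetical summaries
mu <- xbar + sqrt(s2 / n) * rt(B, df = n - 1)

# An equal-tail 95% credible interval for mu
ci <- quantile(mu, c(0.025, 0.975))
```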

Normal model: “noninformative” prior

Posterior predictive

\(f(\tilde{x} \mid \boldsymbol{x}) = \int_0^\infty \int_{-\infty}^\infty f(\tilde{x} \mid \mu, \sigma^2) \, \pi(\mu, \sigma^2 \mid \boldsymbol{x}) \, \mbox{d} \mu \, \mbox{d}\sigma^2\)

which is \[s\sqrt{1 + \frac{1}{n}} \times t_{n-1} + \bar{x} \, ,\]

that is, location-scale Student-\(t\) with \(n-1\) degrees of freedom.
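The same representation gives posterior predictive draws directly; again `n`, `xbar`, `s2` are hypothetical summaries.

```r
# Posterior predictive draws via the location-scale Student-t form
set.seed(1)
n <- 12; xbar <- 2.0; s2 <- 0.5; B <- 10000  # hypothetical summaries
xtilde <- xbar + sqrt(s2 * (1 + 1 / n)) * rt(B, df = n - 1)
```

The extra \(1/n\) inside the scale reflects posterior uncertainty about \(\mu\) on top of the sampling variability of a new observation.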

Normal model: fully conjugate prior

Normal model: fully conjugate prior

Fully conjugate prior

\(\pi(\mu, \sigma^2) = \pi(\sigma^2) \times \pi( \mu \mid \sigma^2)\)

with \[\begin{align} \sigma^2 &\sim \mbox{Inv-Gamma}\left( \frac{n_0}{2}, \, \frac{n_0 s_0^2}{2}\right) \, , \\ \mu \mid \sigma^2 &\sim \mbox{Normal}\left(m_0, \, \frac{\sigma^2}{\kappa_0}\right) \, . \end{align}\]

Note that this is equivalent to the normal-inverse-gamma distribution with density

\(\pi(\mu, \sigma^2) \propto (\sigma^2)^{-(n_0+3)/2} \exp\left( -\frac{n_0 s_0^2 + \kappa_0 (\mu - m_0)^2}{2\sigma^2} \right)\)

Normal-inverse-gamma distribution

\(n_0 = 5\), \(s_0^2 = 1\), \(m_0 = 0\), \(\kappa_0 = 3\)

Normal model: fully conjugate prior

Prior \(\pi(\mu, \sigma^2) = \mbox{Inv-Gamma}\left(\sigma^2; \, \frac{n_0}{2}, \, \frac{n_0 s_0^2}{2} \right) \times \mbox{Normal}\left( \mu ; \, m_0, \, \frac{\sigma^2}{\kappa_0}\right)\)


\(n_0\): prior “sample size” (degrees of freedom) for the variance


\(s_0^2\): prior guess at the variance \(\sigma^2\)


\(m_0\): prior guess at the mean \(\mu\)


\(\kappa_0\): prior “sample size” for the mean

Normal model: fully conjugate prior

Posterior \(\pi(\mu, \sigma^2 \mid \boldsymbol{x}) = \mbox{Inv-Gamma}\left(\sigma^2; \, \frac{n_1}{2}, \, \frac{n_1 s_1^2}{2} \right) \times \mbox{Normal}\left( \mu ; \, m_1, \, \frac{\sigma^2}{\kappa_1}\right)\)


\(n_1 = n_0 + n\)


\(n_1 s_1^2 = n_0 s_0^2 + (n-1) s^2 + \frac{\kappa_0 \, n}{\kappa_1} (\bar{x} - m_0)^2\)


\(m_1 = \frac{\kappa_0 m_0 + n \bar{x}}{\kappa_1}\)


\(\kappa_1 = \kappa_0 + n\)

Normal model: fully conjugate prior

Posterior \(\pi(\mu, \sigma^2 \mid \boldsymbol{x}) = \mbox{Inv-Gamma}\left(\sigma^2; \, \frac{n_1}{2}, \, \frac{n_1 s_1^2}{2} \right) \times \mbox{Normal}\left( \mu ; \, m_1, \, \frac{\sigma^2}{\kappa_1}\right)\)

  • As with the noninformative case, \((\mu, \sigma^2)\) can be sampled sequentially with the compositional method.

  • The marginal posterior of \(\mu\) is \(\sqrt{\frac{n_1 \, s_1^2}{n_1 \, \kappa_1}} \times t_{n_1} + m_1\),

that is, location-scale Student-\(t\) with \(n_1\) degrees of freedom.
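A sketch of compositional sampling from this posterior, using the standard normal-inverse-gamma updates (\(n_1 = n_0 + n\), \(\kappa_1 = \kappa_0 + n\), and so on); the data summaries and prior hyperparameters below are hypothetical.

```r
# Compositional sampling under the fully conjugate (normal-inverse-gamma) prior
set.seed(1)
n <- 12; xbar <- 2.0; s2 <- 0.5       # hypothetical data summaries
n0 <- 5; s02 <- 1; m0 <- 0; k0 <- 3   # hypothetical prior: n0, s0^2, m0, kappa0
B  <- 10000

# Standard conjugate updates
k1    <- k0 + n
n1    <- n0 + n
m1    <- (k0 * m0 + n * xbar) / k1
n1s12 <- n0 * s02 + (n - 1) * s2 + (k0 * n / k1) * (xbar - m0)^2

# sigma^2 | x ~ Inv-Gamma(n1/2, n1 s1^2 / 2), then mu | sigma^2, x
sig2 <- 1 / rgamma(B, shape = n1 / 2, rate = n1s12 / 2)
mu   <- rnorm(B, mean = m1, sd = sqrt(sig2 / k1))
```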

Normal model: fully conjugate prior

Posterior predictive

\(f(\tilde{x} \mid \boldsymbol{x}) = \int_0^\infty \int_{-\infty}^\infty f(\tilde{x} \mid \mu, \sigma^2) \, \pi(\mu, \sigma^2 \mid \boldsymbol{x}) \, \mbox{d} \mu \, \mbox{d}\sigma^2\)

which is \[\sqrt{\left( \frac{n_1 \, s_1^2 (\kappa_1 + 1)}{n_1 \, \kappa_1} \right)} \times t_{n_1} + m_1 \, ,\]

that is, location-scale Student-\(t\) with \(n_1\) degrees of freedom.

Normal model: conditionally conjugate prior

Normal model: cond. conjugate prior

Conditionally conjugate prior

\(\pi(\mu, \sigma^2) = \mbox{Normal}(\mu; \, m_0, \, v_0) \times \mbox{Inv-Gamma}\left(\sigma^2; \, \frac{a_0}{2}, \, \frac{b_0}{2} \right)\)

Posterior complete (or full) conditionals

\(\mu \mid \sigma^2, \boldsymbol{x} \sim \mbox{Normal}(m_1, v_1)\) with \[v_1 = \frac{1}{\frac{1}{v_0} + \frac{n}{\sigma^2}} \quad \text{and} \quad m_1 = v_1 \times \left(\frac{1}{v_0} m_0 + \frac{n}{\sigma^2} \bar{x} \right) \, ,\]

and

\(\sigma^2 \mid \mu, \boldsymbol{x} \sim \mbox{Inv-Gamma}\left(\frac{a_1}{2}, \, \frac{b_1}{2} \right)\) with \[a_1 = a_0 + n \quad \text{and} \quad b_1 = b_0 + \sum_{i=1}^n (x_i - \mu)^2 \, .\]
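These two complete conditionals suggest a Gibbs sampler that alternates between them. A minimal sketch, with hypothetical simulated data and hyperparameter values:

```r
# Gibbs sampler for the conditionally conjugate prior
set.seed(1)
x  <- rnorm(12, mean = 2, sd = 1)          # hypothetical data
n  <- length(x); xbar <- mean(x)
m0 <- 0; v0 <- 100; a0 <- 2; b0 <- 2       # hypothetical hyperparameters
B  <- 5000
mu <- numeric(B); sig2 <- numeric(B)
mu_cur <- xbar; sig2_cur <- var(x)         # initial values

for (b in 1:B) {
  # mu | sigma^2, x ~ Normal(m1, v1)
  v1 <- 1 / (1 / v0 + n / sig2_cur)
  m1 <- v1 * (m0 / v0 + n * xbar / sig2_cur)
  mu_cur <- rnorm(1, m1, sqrt(v1))
  # sigma^2 | mu, x ~ Inv-Gamma(a1/2, b1/2)
  a1 <- a0 + n
  b1 <- b0 + sum((x - mu_cur)^2)
  sig2_cur <- 1 / rgamma(1, shape = a1 / 2, rate = b1 / 2)
  mu[b] <- mu_cur; sig2[b] <- sig2_cur
}
```

Unlike the previous two priors, there is no closed-form joint posterior here, so the compositional method is replaced by iterative sampling.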

Problems with inverse-gamma

See Gelman (2006), Polson and Scott (2012)

Code
a <- 2
b <- 3

dinvgamma <- function(x, a, b) {
  dgamma(1/x, shape = a, rate = b) / x^2
}

par(mar = c(4,4,0,0) + 0.1)
curve(dinvgamma(x, a = a, b = b), from = 0, to = 8, n = 1000, lwd = 2,
      xlab = expression(sigma^2), ylab = "density", cex.lab = 1.5, axes = FALSE)
axis(side = 1)

Normal model: alternate priors

Normal model: uniform scale

Uniform scale prior

\(\pi(\mu, \sigma) = \mbox{Normal}(\mu; \, m_0, \, v_0) \times \mbox{Uniform}(\sigma; \, 0, \, b_\sigma)\)

  • Uniform on the standard deviation rather than variance
  • Hard upper boundary on the scale (check for this in the posterior)
  • Implies \(\pi(\sigma^2) \propto (\sigma^2)^{-1/2} \, 1\{ 0 < \sigma^2 < b_\sigma^2 \}\)

Normal model: Half-Cauchy scale

Half-Cauchy scale prior

\(\pi(\mu, \sigma) = \mbox{Normal}(\mu; \, m_0, \, v_0) \times \mbox{Cauchy}^+(\sigma; \, s_0)\)

where

\(\mbox{Cauchy}^+(\sigma; \, s_0) = \frac{2}{\pi s_0} \left( 1 + \frac{\sigma^2}{s_0^2} \right)^{-1} \, 1\{ \sigma > 0 \}\)

is a scaled Student-\(t\) with 1 degree of freedom, truncated below at 0.

Normal model: Half-Cauchy scale

The prior \(\pi(\sigma) = \mbox{Cauchy}^+(\sigma; \, s_0)\) is equivalent to

\[\begin{align} \sigma^2 \mid \eta &\sim \mbox{Inv-Gamma}\left(\frac{1}{2}, \, \frac{1}{\eta}\right) \, , \\ \eta &\sim \mbox{Inv-Gamma}\left(\frac{1}{2}, \, \frac{1}{s_0^2}\right) \, . \end{align}\]

Complete conditional distributions:

\[\begin{align} \sigma^2 \mid \eta, \, \mu, \, \boldsymbol{x} &\sim \mbox{Inv-Gamma}\left(\frac{n+1}{2}, \, \frac{\sum_{i=1}^n (x_i - \mu)^2}{2} + \frac{1}{\eta} \right) \, , \\ \eta \mid \sigma^2, \, \mu, \, \boldsymbol{x} &\sim \mbox{Inv-Gamma}\left(1, \, \frac{1}{\sigma^2} + \frac{1}{s_0^2}\right) \, . \end{align}\]
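Combined with the Normal complete conditional for \(\mu\) under a \(\mbox{Normal}(m_0, v_0)\) prior, these conditionals yield a Gibbs sampler. A sketch with hypothetical data and hyperparameters:

```r
# Gibbs sampler for the half-Cauchy scale prior via inverse-gamma augmentation
set.seed(1)
x  <- rnorm(12, mean = 2, sd = 1)          # hypothetical data
n  <- length(x); xbar <- mean(x)
m0 <- 0; v0 <- 100; s0 <- 1                # hypothetical m0, v0, half-Cauchy scale s0
B  <- 5000
mu <- numeric(B); sig2 <- numeric(B)
mu_cur <- xbar; sig2_cur <- var(x); eta_cur <- 1

for (b in 1:B) {
  # mu | sigma^2, x ~ Normal(m1, v1), as in the conditionally conjugate case
  v1 <- 1 / (1 / v0 + n / sig2_cur)
  m1 <- v1 * (m0 / v0 + n * xbar / sig2_cur)
  mu_cur <- rnorm(1, m1, sqrt(v1))
  # sigma^2 | eta, mu, x ~ Inv-Gamma((n+1)/2, sum((x - mu)^2)/2 + 1/eta)
  sig2_cur <- 1 / rgamma(1, shape = (n + 1) / 2,
                         rate = sum((x - mu_cur)^2) / 2 + 1 / eta_cur)
  # eta | sigma^2 ~ Inv-Gamma(1, 1/sigma^2 + 1/s0^2)
  eta_cur <- 1 / rgamma(1, shape = 1, rate = 1 / sig2_cur + 1 / s0^2)
  mu[b] <- mu_cur; sig2[b] <- sig2_cur
}
```

The latent \(\eta\) keeps every conditional a standard distribution, so no Metropolis step is needed despite the non-conjugate half-Cauchy prior.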

Example

Zinc data

Example

Consider the brass alloy zinc data described in Section 1.2 of BIDA and available at https://blogs.oregonstate.edu/bida/data-sets-and-code/

Assume the model \(x_i \overset{\text{iid}}{\sim} \mbox{Normal}(\mu, \sigma^2)\) for \(i = 1, \ldots, n\) with \(n=12\) and perform an analysis with

  • the noninformative prior on \((\mu, \sigma^2)\)
  • the fully conjugate prior